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Abstract — The paper describes a receding horizon control design 
framework for continuous-time stochastic nonlinear systems subject to 
probabilistic state constraints. The intention is to derive solutions that are 
implementable in real-time on currently available mobile processors. The 
approach consists of decomposing the problem into designing receding 
horizon reference paths based on the drift component of the system 
dynamics, and then implementing a stochastic optimal controller to allow 
the system to stay close and follow the reference path. In some cases, 
the stochastic optimal controller can be obtained in closed form; in more 
general cases, pre-computed numerical solutions can be implemented in 
real-time without the need for on-line computation. The convergence of 
the closed loop system is established assuming no constraints on control 
inputs, and simulation results are provided to corroborate the theoretical 
predictions. 
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I. Introduction 

The behavior of robotic systems can be uncertain due to a 
variety of reasons, including noise in sensor measurements and 
environmental effects. Such effects are often represented by stochastic 
models (for example, ocean waves (2), wind guests |3| and uneven 
terrain [4]). For nonlinear stochastic systems, existing methods for 
constrained optimal control are too computationally demanding for 
real-time implementation. Specifically, no real-time solution exists for 
continuous-time nonlinear stochastic systems with probabilistic state 
constraints. A receding horizon formulation partially lifts some of the 
computational burden associated with the nonlinear stochastic optimal 
control problem, but current state of the art does not allow real-time 
implementation on processors at the low-end of the frequency scale. 
This paper proposes a solution through a stochastic receding horizon 
formulation that is real-time implementable for nonlinear systems 
of modest dimension, and comes with probabilistic guarantees of 
convergence and state constraint satisfaction. 

Within a predictive control framework, uncertainty can be ac- 
counted for by either approximating sets that bound the system's 
trajectories [5]-[ll| or by stochastic models, with the latter having 
some specific advantages. In particular, while methods based on 
set-bounded models may result in over-conservative designs since 
they plan for the worst case, the use of probabilistic constraints in 
the methods which are based on stochastic models, on the other 
hand, allows for less conservatism. In addition, stochastic model- 
based methods provide some flexibility by allowing one to adjust the 
probability that problem constraints are violated. These two qualities 
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enable stochastic model-based methods to offer solutions where set- 
bounded methods may fail. 

The structure of the dynamics, whenever it can be exploited, can 
greatly facilitate the solution of a model predictive control (MPC) 
problem. When the stochastic dynamics is linear, one may choose 
to apply a Kalman filter or its variants and solve an iterative LQG 
problem 1 1 2| . Alternatively, for linear stochastic systems, the optimal 
control problem under probabilistic constraints is tackled within a 
chance-constrained model predictive control framework JT3i)-[20|. 
Chance-constraint formulations are available for linear discrete time 
systems with Gaussian noise |18|, [20|-[29|. 

While methods exist to enable MPC in linear stochastic systems 
[18], 1 20 1 — 1 29 1 , for most nonlinear systems, the stochastic receding 
horizon optimal control problem can not be solved in real-time. For 
example, a particle filter implementation of chance-constrained model 
predictive control is available for linear systems with probabilistic 
noise [19], 1 30), and it is in principle applicable to nonlinear systems 
too. However, the approximate solutions obtained using this method 
depend on the number of particles, and convergence is achieved 
after a sufficiently large number of particles is used. Alternative 
(discrete-time) methods combine a hybrid density filter with dynamic 
programming |31| , the latter being the natural discrete formulation 
of the optimal control problem. In the hybrid systems literature 
we find reach-avoid formulations of this problem (32), |33| , in 
which the indicator function of hitting goal or obstacle sets appears 
in the cost of the optimal control problem (similarly to what is 
done in this paper). Computational complexity currently limits the 
application of these methods to systems with up to three states |33| , 
while requirements for real-time implementation are not imposed. 
Invariably, computational complexity and accuracy issues surface in 
all discrete-time and space methods, either primarily due to the use 
of filters, or simply due to the resolution required in the time or 
state-space domains. 

Time and space-discretization may be avoided if the problem is 
formulated in continuous space and time. Continuous-time solutions 
to stochastic optimal control problems are available for systems 
affine in control and with state independent and time invariant 
control transition matrix, and it is based on path integrals |34| . 
A path integral is essentially the solution to a Hamilton- Jacobi- 
Bellman (HJB) equation, obtained after the application of a particular 
transformation [331. In certain cases, the path integral is computable 
numerically using Laplace approximations or Monte Carlo sampling. 
Different applications of path-integral stochastic optimal control have 
been explored, such as reinforcement learning [36], variable stiffness 
control (equivalent to automatic tuning of PD gains) [37] and risk 
sensitive control |38|. The main issue with path integrals is that for 
most nonlinear systems the solution is computationally demanding 
and can not be obtained in real-time on existing processors. This 
limits the application of path integral to real-time receding horizon 
control on miniature robots. 

The main contribution of this paper is to synthesize a real-time 
design for stochastic (receding horizon) control, following an exit time 
1 39 1 formulation of the stochastic optimal control problem, instead of 



one based on path integrals. The proposed formulation yields a time 
invariant control vector field, which is optimal in terms of actuation 
utilization. What enables real-time implementation is the fact that the 
field can be computed off-line and used on-line in a recursive manner. 
The formulation is based on a combination of deterministic planning 
with stochastic optimal control, where successive locally optimal 
stochastic controls are used to steer a system along a deterministic 
receding horizon reference trajectory, which is conceptually similar 
to Differential Dynamic Programming (40) and iterative LQG JH) . 
While such a two-level planning and control strategies has been 
used successfully in a deterministic setting |41|-|43| there is no 
stochastic analog yet except our own work (T). Due to the explicit 
consideration of stochasticity, the proposed method offers almost 
sure (with probability one) guarantees of collision avoidance and 
convergence to a desired region, which are elusive in a deterministic 
setting. 

The work presented in this paper is organized in the following 
way. Section [n] states the problem formally followed by an intuitive 
explanation of our approach in section [Til] Section [IV] explains 
a stochastic optimal control design, which is at the heart of our 
framework. Section [V] presents the design of the stochastic receding 
horizon framework and discusses the existence of solutions for our 
closed loop system. The convergence properties of the resulting 
stochastic hybrid system are established in Section [yT] and the issue 
of input saturation is brought up. Section [VIl| offers examples of linear 
and nonlinear systems, presents simulation results for the cases of 
unbounded and bounded inputs, and discusses computation methods 
for complex nonlinear stochastic systems. We conclude in Section 

Ens 

II. Problem Statement 



For general nonlinear systems, global analytic solutions to the 
above stochastic optimization problem are not available. Numerical 
solutions can be obtained, but depending on the size of the dynamics 
and the constraints of the problem, the computation cost can be too 
high for real-time implementation on processors on the lower side of 
the frequency scale. This limitation motivates us to seek sub-optimal 
solutions to the above problem by solving the following relaxation 
instead. 




Fig. 1 . Illustration of the modified problem statement. Obtain the solution V 
to a deterministic optimal control problem using the drift part of the dynamics 
(solid thick curve), and then maintain the full stochastic dynamics (thin sample 
path) i?-close to that reference solution with accuracy e. 



Consider an uncertain dynamical system evolving within an open 
bounded region 5 C W 1 . Within 5, there is a closed set O C S which 
represents forbidden areas (obstacles). In that sense, the system can 
safely evolve only in the free workspace V = S\0. 

The dynamics of the system is given in the form of a stochastic 
differential equation (SDE) 

dq(t) = b (q(t)) dt + G (q(t)) [u (q(t)) dt 

+ E (q(t)) dW(t)] , q(0) = q (1) 

where q G R™ is the state, b : R™ -> R n is the drift term, G ■ 
R n -> R m is the matrix of control vector fields, u : 1" -> R m is 
the control input, and E : R n -> E mxm is the diffusion term. Let 
W = {W(t),(Ft ■ < t < oo} be an m-dimensional Wiener process 
on the probability space (fl, T, P), where fi is the sample space, T 
is a cr-algebra on Q, P is the probability measure and {J-~t : t > 0} is 
the filtration (i.e. an increasing family of sub-cr-alsebras of J 7 ) that 
is right continuous and J-~o contains all P-null setsrl 

In a typical stochastic optimal control problem, one has to find 
a control sequence to steer the dynamics to a desired configuration, 
while minimizing the cost functional 



V(q,u) 



minE 

u(t) 



r oo - 

/ L{q{s),u{s))ds q(0) 
Jo 



subject to P[q(t) G O] = 0, Vt 

where the function L is the incremental cost, assumed positive 
definite. 

'The justification and the detailed definition for these mathematical con- 
structions can be found in |44|. 



Problem 1 (Modified Problem Statement): Find a sequence of 
feedback control laws {ui(q)}fL 1 for QJ, such that if q*(t) is the 
solution of the system^ 

4 = 6(q) + G(q>(q) (2) 

for a u*(t) that minimizes the functional 

J(q,it) = mm / L(q(s),u(s)) ds (3) 

" Jo 

q(0) = q . 



subject to inf \\q(t) — z\\ > R > 2e > 

z£O,t>0 " v ' ' 



where, R and e are positive constants. If T = {7 G R" j 3t G R; 7 = 
q*(£)} denotes the locus (path) of that solution, then for a given 



selection {7i} i=1 C T of iV points on T such that inf; 



Il7»-7jrll > 



2e, sup iJ ||7»— 7j || < R—2e and qjy = 0, the application of {it;(q)} 
to ([T} results in sample paths q(t) that achieve 

(i) P [inf 7er ||q(t) - 7|| < R] = 1, Vt > (almost-sure safety); 

(ii) P [3t s < 00 : ||qjv — q(^a)|| < e] = 1 (almost-sure conver- 
gence with accuracy e > 0); 

(iii) E i L(q(s), iti(s)) ds + $(q(fi))J is minimized, where 
ti-j and ti are the first times q(i) enters an e-neighborhood of 
7i_r and 74, respectively, and <D(q) : K n — > E+ is a terminal 
cost function (local optimality). 

Even in this form, the problem does not lend itself to efficiently 
computed solutions because of the nonlinear infinite-horizon optimal 
control problem that needs to be solved to obtain T. For this reason, 

2 Assume that the dimension of the controllability distribution is of rank n. 



the solution (q*,u*) of the deterministic optimal control problem will 
be approximated by the solution of the receding horizon problem 

J T {q,u rh ) = mm f L(z{s),u{s)) ds + Q(z(T)) (4a) 
«(*) Jo 

subject to z = b(z) + G(z)u , z(0) = q (4b) 

where T is the prediction horizon of the optimization, function L is 
the same as in j3j, and Q : R n —¥ R+ is the terminal cost which 
approximates the truncated tail of the integral in {3}. The idea behind 
a receding horizon optimization strategy is that one solves the finite 
horizon optimal control problem and obtains a control law U r h(t) 
computed for z(0) = q(io)- Control law u r h(t) is applied on (|2j 
for the time interval [to,£i], ti < to + T, during which time a new 
control law is computed for z(0) = q(ti), with q(ti) predicted based 
on {2). At time t = t\, the control law is updated and the process is 
repeated. It is known [45] that if Q (z) is a control Lyapunov function 
for d4bk, and 



.{Q(x) + L(z,u)}<-ti(\\z\\) 



(5) 



where 77 is a class-/C function of ||z||, then application of U r h(t) 
results in ||z| — > asymptotically with time. We assume that Q 
is a control Lyapunov function for j4b| > here as well, and that there 
exists a positive definite function r\ satisfying |5}. In the our modified 
problem setting, {iti(q)} takes the place of U r h{t) and q(U) = {"fi}- 

III. An intuitive example 

Consider a robot moving in a two-dimensional space, and described 
by single integrator dynamics perturbed by stochastic noise: 



dq(t) = w(q(t)) dt + dW(t); q(0) = qo 



(6) 



where q = [a; y] T is the state vector, it(q) is the control input and 
W(t) is a two-dimensional Wiener process. The objective is to find 
a feedback control law u(q) to drive the system e-close to the origin, 
while avoiding the boundary of a circle with radius R, centered at 
the origin. 

An obvious control strategy is to just steer the system along a 
direction toward the origin. A normalized vector pointing to the origin 
from the current state q is — . To satisfy the state constraints, the 
system should be forced away from the circle with radius R. One 
way to achieve this is by weighting the control input by a factor 
„ -m I, . This results in 



t (q) = -- 



(7) 



(fl-llqll)llq 

It turns out, this intuitive design yields a stochastic control law which 
is actually optimal. In fact, jTJ minimizes the cost 

1 



V(q,u) = E 
where 



-u T (q(s)) u(q(«)) ds + *(q(T)) q(0) = q 



$(q(r)) 



on||q(r)l|=s 
00 on ||q(r) | = R 



and r is the first time the state hits either the circle with radius e or 
that with radius R. Control law (7} guarantees that the system avoids 
the i?-radius circle boundary with probability one, and consequently 
hits the e-radius circle with probability one, because it is known that it 
almost surely exits the domain {s < ||q|| < R} somewhere (see |44| 
Lemma 7.4], and the discussion in the section that follows). Sample 
paths for the given controller are shown in Fig. |2(a)| for different 
initial conditions. 




(a) Sample Paths 
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(b) Recursive Execution 

Fig. 2. The sto chas tic optimal controller |(a)| Sample paths for a single 
integrator in 2D. |(b)| The trajectory resulting from implementation of the 
stochastic optimal controller in a receding horizon framework. 



Assume now that as soon as the system hits the circle of radius 
e around the origin, a coordinate transformation occurs which shifts 
the origin to a point within distance R from its prior location. Then 
the same controller can be reapplied to drive the system to a e- 
neighborhood of the new origin. An iterative scheme based on this 
idea can be used to steer the system from point A to point B in 
a receding horizon manner. A sample trajectory resulting from an 
implementation of such a receding horizon controller is shown in 
Fig. [2(b)] 

While the design of the controller |7]( that enables convergence to 
way-points is simple for the case of the stochastic single integrator 
of |6j, is not the case for general stochastic nonlinear systems. In 
following sections, we outline a mathematical framework that allows 



the computation of receding horizon controllers for more complex 
stochastic nonlinear systems. 

IV. Stochastic Optimal Control with Exit Constraints 

In this section we design stochastic optimal controllers with 
exit constraints. These controllers guarantee convergence to a given 
set, and satisfaction of state constraint, both with probability one. 
Consider the stochastic system l|T]l 

dq(t) = 6(q(t)) dt + G(q(t)) [u(q(t)) dt 

+E(q(t)) dW(t)] , q(0) = q 

which evolves within a bounded domain PC? with a C 2 boundary 
(9D and closure denoted T>. Assume that b(q), G(q), E(q), and 
E _1 (q) are bounded and Lipschitz continuous on T>. The objective 
is to find the control u(q) that yields 



V(q,t) = minE 

u(q) 



L(q(s), u(s)) ds 



+$(q(tAr B )) q(0) 



(8) 



where tt> is the first exit time from the domain T>. (Notation t A Tz> 
is standard for min(i, T£>).) The incremental cost L(q, u) in {8} is 
defined as ^ 

L(q,lt) = l(q,t) + -u J a _1 (q)« 

where a(q) = E(q)E T (q). We impose an admissibility condition 
that there exist a set of control inputs u q £ U q such that for all 
initial conditions q and control inputs it q , the cost V(q, tt>) < oo. 

The HJB equation associated with JSl is 



mm < 

u(q) 



{AV(q,t) + L(q(t),u(t))} =0 (9) 
where A the second-order partial differential operator 

A = wt +E (q)+ G ^ (<0«i 00) + 5 E E (<0 



<9q.,<9q fc 



Equation |9]( is written in matrix form as follows 

min\d t V(q,t) + d q V(q,t) b{q) + q V T (q, t) G(q)«(q) 

ll(q) I 



+ 2 tr {^(q>*) G(q)E(q) E T (q)G T (q)} 



+ 



Kq,t) + ^ T (q)a(q)- 1 u(q)} =0 



where tr stands for trace. The optimal control law u* G U c[ that 
solves (9} is then given as 



u*(q) = -a(q)G'(q)d q V(q,t) 
Substituting <(To]l in ((9} yields 

a t y( q ,t) + a q \/ T (q,t) &(q) 
i 



(10) 



8 q V'(q,t) G(q)a(q) G T (q) 8 q V(q,t) 



1 



+ 2 tr {^^(q-*) G (q) E (q) ST (q) G T ( q )} 



+ i(q,t) = 0. (11) 



Using the logarithmic transformation 1 35 1 

V(q,t) = -lo g5 (q) , 



and with substitution in ([TTJ we get 

- d t g(q, t) = -l(q, t) g(q, t) + d q g J (q, t) b(q) 

+ ^tr{a qq5 (q,t) G(q) E(q) E T (q) G T (q) } = (12) 
with boundary condition 

j(q,fATp) = exp(-$(q(iAri,))j, q £ dV . 

Analytic solutions of the above partial differential equation (PDE) 
are generally not possible for complex nonlinear systems. However, 
the Feynman-Kac formula [ 44 1 relates a certain PDE with an equiv- 
alent SDE, and facilitates the numerical solution of the PDE through 
numerical simulation of the SDE. Using the FeynmanTSac formula 
(44), the solution of l |12[ l takes the form 



g(q) = E 



= E 



<?(q, t A tt>) exp 



[Jo 



exp 



(-<&(£(* A T D ))) 



l(q,s)ds C(0) = q 



ox V \ I Z(q,s)ds ] | C(0) 
(13) 



where £(t) is the Markov process 

dC(t) = &(C(*)) dt + G(q(t)) E(C(t)) dW(t) (14) 

evolving on the same bounded open set T> C R™. 

Stochastic Optimal Control with Exit Constraints: Under the 
assumption 

minaii(q) > (15) 

for some 1 < I < m, one can show that E[r-p | q(0) = qo] < oo, 
Vqo G T> j44| Lemma 7.4]. This means that the system will escape 
the domain T> in finite time with probability one. The assumption 
that E and E _1 are bounded, ensures satisfaction of (15) . 

A guarantee that the system does not exit from a specific portion 
of the boundary can be obtained by imposing an infinite penalty for 
touching that surface. Consider a partition of the boundary &D in the 
form N C dV\ M = dV\J\f. Then choose $ as 



and 



X\A = 



-00 • Xm_ ; 



on M 

1 on M 



Assuming that l(q, i) = and letting t — > 00, the resulting parabolic 
PDE (12) gives rise to the Dirichlet problem 

d q9 T (q) b(q) + ^tr{9 qq5 (q) G(q) E(q) E T (q) G T (q)} = (16) 

's(q(Tz>))=l q(r-v)eN 
j{<l(TT>))=0 q{rv)eM 

Then (13) suggests that g(q) is in fact the probability that the sample 
path of (14) from q hits boundary M before M. Function g(q) takes 
the form 

g(q) = P [CM G N I C(0) = q] . (17) 

and is the Markov process (14) . Now if the admissibility 

condition is satisfied then the optimal control with infinite penalty 
on exit boundary is equivalent to a constraint (see (39)), 

P[q(rx>) £ M I q(0) = q] = 



Remark 1: The computation of control input < | 1 0| > requires g(q), 
which can be found by either by solving \\2\ analytically, or 
numerically simulating l |14| ( and computing As $ imposes an 
infinite penalty on state trajectories that exit through M, the above 
construction forces the system to exit through M while avoiding 
M with probability one. The problem of stochastic optimal control 
with terminal cost at exit time is discussed in [35], while a specific 
problem of exit constraints was discussed in |39). The latter reference 
also shows that imposing an exit constraint is equivalent to having 
infinite penalty on exit location used in this section. We use these two 
results and thus by defining M to be the boundary of state constraint 
regions, we achieve the guarantees that state constraints are satisfied, 
and convergence to a desired region is achieved in finite time. 

V. Stochastic Receding Horizon Control Design 

After the presentation of the continuous-time constrained stochastic 
optimal control formulation in its general setting, we proceed with 
the description of the implementation of these techniques inside the 
receding horizon framework that was outlined in the example of 
Section [ill] Out of this process emerges a simple, special case of 
a gereral stochastic hybrid system (GSHS), for which the existence 
of solutions has been established in literature (46). The section 
concludes with an examination of the closed loop stability and 
convergence properties of this simplified GSHS, and a discussion on 
how input saturation affects these properties. 

A. Deterministic Planning 

We begin by computing a receding horizon path using \4b\ 

q = &(q) + G(q>(q) . 

Let q T (t) : [to, to + T] — > R n be the trajectory that, for a prediction 
horizon T, minimizes the cost functional 



J T (q,u)=min / L(q(s),u(s)) ds + Q(q(t + T)) 



subject to inf ||q(t) — 1 1 > R 
te[t ,t +T] 



q(to) 



with functions L and Q as in (4). Define a receding horizon path as 
T T = {7 G K n I 3t G [*o, to +T] : 7 = q T {t)} . (18) 

Here we adopt the approach of [47] to obtain an approximation 
of q^ and consequently compute Ft- The latter, however, can also 
be obtained through an array of alternative methodologies, including 
potential field methods |48| , rapidly exploring random trees RRTs 
(49), or cell decomposition methods (50) . 

B. Way-point Generation 

Let the closed ball of radius e centered at a point 7 is denoted 

£> 7 (e) = {q : ||q — 7|| < e}, and its complement, B 7 (e). Now 
consider a sequence of points {yi}fL £ Ft with 70 := q(to) and 
7jv := q T (t + T), satisfying 

max {0(a)}- mm {Q(b)} < -r?(||7i-i||) , (19) 

where 7 is the positive definite function in (5). Define domains T>i, 
for i = 1, . . . , N, such that |J; 2?* H O = and 



Decompose the boundaries of those domains as follows (see Fig. [3}: 

Mi = dVi n B~ u (s) (21) 
Mi = &Di\Mi (22) 

The domains T>i are defined such that Mi is non-empty for all i. 




Fig. 3. Illustration of the local domains T>i (hashed region). Also shown are 
the receding horizon path (continuous curve), the way-points defined by 
the sequence {7^} (crosses), the obstacles Oj (solid disks), the boundaries 
Mi (dashed blue inner boundary) and Mi (dashed red outer boundary). 



C. Stochastic optimal controllers 

The system state is a Markov process q(t) that evolves between 
way-points according to the SDE 

dq(t) = b(q(t)) di+G(q(t)) [«i(q(t)) di + E(q(i)) dW(t)] (23) 

where E(q), b(q), G(q), E _1 (q) satisfy the requirements of Sec- 
tion [IV] and together with Ui, are all bounded in T>i. The latter is the 
control input responsible for taking the state from Mi-i to Mi while 
avoiding Mi. Let be the first time instant when q(t) G Mi-i. 

When (23) under u t hits Mi at some time t it it undergoes a forced 
transition with Ui switching to Ui+x, and the switch occurs upon 
the state hitting a part of the boundary Mi. Control law Ui gives a 
solution to the stochastic optimal control problem 



-uj (q(s)) a 1 (q(s)) m (q(s)) ds 



+$(q(ti)) q(ti_i) = q 



=-V(q) (24) 



Notice that by setting now the terminal time to ti allows the value 
function V to be time-invariant. We define the exit time for the 
process driven by Ui to be r,: — ti — Function $ is again 

chosen in a way that it imposes infinite on the state hitting Mi. 
Similarly to the analysis of Section [IV] the solution of (24} is 

V(q) = -logff(q) 

where g(q) = P [C(t"i) G Mi j q(ti_i) = q] , and the optimal control 
law for q £ T>i is 



u i *(q) = -o(q) G T (q)d q V(q) 



(25) 



H 74 _ 1 (e)cX> i cB;.(e) 



(20) 



When applied, u*{q) satisfies the following probabilistic condi- 
tions: 

E[r< I q(ti_i) = q] < 00 (26) 
P[q(t<) G Mi I q(ti_i) =q] =0 
^ P[q(ti) G Mi I q(ti-i) = q] = 1 ■ (27) 

Condition (26) translates into the process q(t) exiting T>i in finite 
time with probability one which is guaranteed by assumption \\5\ . 
Condition (27) is equivalent to saying that the process q(t) reaches an 



e-neighborhood of way-point 7, with probability one, before violating 
any state constraints (see |39|). 

Given a receding horizon path Ft seeded with a sequence of way- 
points {7i}j=o> the process of transitioning from way-point 7»_i to 
way-point 7* under \25\ is repeated. By the time a new way-point is 
reached, the path IV has been recomputed in a receding horizon 
manner, and the way-point sequence {-yi}fL Q redefined with the 
initial element 70 being the way-point just reached. What is important 
for real-time implementation is that for predetermined domains T>i, 
(25} can be precomputed off-line, numerically in general but also 
analytically in special cases where b, G and E are such that the 
boundary value problem for PDE {12} can be solved explicitly. 



D. The Resulting Stochastic Hybrid System 

Closing the loop around (23} by means of a receding horizon 
strategy gives rise to a switched stochastic hybrid system, where 
switching is due to w, and occurs as a forced transition whenever 
q(£) hits a set Mi. The hybrid state here is just (i, q) where q G R n 
and i G {0, 1, 2, N} =: X are the continuous and discrete states, 
respectively. This system can be classified as a GSHS, a general 
modeling framework of which is described in |46|; however, it is 
a very simplified version of the the general definition of (46), which 
can be adequately described by defining only the following three 
components: the continuous dynamics, the discrete dynamics, and 
the reset condition. 

Continuous Dynamics: The continuous state q(t) evolves ac- 
cording to the SDE (23} 

dq(t) = b(q(t)) dt + G(q(t)) [«(», q(i)) dt + E(q(i)) dW(t)] 

(28) 

where we have just replaced Uj(q(t)J with u(i,q(t)) to emphasize 
the explicit dependence of the control input on the discrete state i, 
making it a function of the hybrid state (i, q):M:lxR"-> R m . The 
drift b and diffusion E terms, along with G, are assumed independent 
of i. When in discrete state i, the domain of the continuous variable 
q(t) is T»i. 

Discrete Dynamics: The (single) discrete state i evolves by 
means of state-triggered forced transitions, which occur each time the 
continuous state q hits a guard. In this case the guard is a function 
from i to R", sending i Mi. The time at which the transition 
is triggered is called stopping time and it is the first time instant 
ti = inf{t > ti-\ I q(t) ^ T>i}. Then the discrete state changes 
according to the following — in fact, deterministic — rule: 



VI. Convergence and Stability Properties 

This section presents a proposition that establishes the finite-time 
convergence properties of the closed loop system to a neighborhood 
of the origin. 



(i + 1 I i,q(U-i) = q) = 



1 q(ti) G Mi 
otherwise . 



Note that due to the set of discrete states being finite, and the discrete 
transition map being a bijection, there can only be a finite number 
of discrete transitions and the system cannot exhibit Zeno behavior. 

Reset Condition: During discrete transitions, continuous states 
are not reset. Essentially, the reset map for the continuous states is 
simply the identity. 

The solution of (28} over i = 1, . . . , N, is a collection of Markov 
processes truncated at (their) exit time, which can be represented as 
a Markov string. A Markov string is a hybrid state jump Markov 
process (46). Given the existence of solutions for each SDE (23} for 
fixed i (see |39[ for details), and due to the finiteness of the set 
of discrete states, the solutions for the closed loop stochastic hybrid 
system are well defined [46]. 



Proposition 1: Consider the switched stochastic system (23} in an 
open bounded domain S C R n , where i G X is the switching index, 
and W(t) is a Wiener process. Let Q(q) be a C 2 , positive definite 
function in the closure of a bounded domain S which contains the 
origin. If for every solution q(i) of the stochastic switched system 
there exist 

(i) bounded domains T>i that satisfy (20)-(22), and 

(ii) a class-/C function 77 on S together with a sequence of points 
H}to G S satisfying (19}, 



then the closed-loop switched stochastic system (23}-(25) converges 
to an e-neighborhood of origin in finite time. 

Proof: It is known (45) that a receding horizon strategy w r h(i) 
applied on (4} yields a trajectory q*(t) satisfying linit-j-oo q*(t) — > 
0. Hence, with sufficiently large T < 00, one can find a path Ft 
such that Ft H Bo(e) 7^ 0. Moreover, condition (5} ensures that for 
any q(to) G S, the system will remain within an open bounded set 
containing the level set of q(io)- This means that for a sufficiently 
large T, the path Ft intersects an e-neighborhood of the origin and 
remains bounded. Given that this set is bounded, one can only cover 
it with a finite number of non-overlapping balls with radius e > 0. 
Hence, for sufficiently large T < 00, there is a finite number of way- 
points TV that satisfy condition {19} with 7 at at the origin. Then, by 
induction it is shown in a straightforward way that the system reaches 
an e-neighborhood of the origin in finite time. 

To this end, set q(io) = 70 » construct a path Ft of finite length 
according to (4), and select a way-point 71 according to {19}. Given 
that bounded domain T>\ satisfies (20}-(22}, the application of control 
law (25} ensures that for all q(t ) G Mo, P{q(*i) G M | q(*o)} = 1, 
that is, the state at time t\ is in Mi almost surely (see Section |IV| 
and 1 39 1). Condition {T5} ensures that the time that this happens is 
finite. 

Now, let us assume that a controller Uk (q) was applied iteratively, 
and at some time tk, state q(tfc) G Mk- As Mk C Dfc+i and given 
\20\ , there exists a controller itfc+i(q) to steer the state to the next 
way-point 7fe+i. Given now that T> k +i also satisfies (20}-(22}, the 
law (25} gives P{q(t fc+ i) G M k+ i\ q(t k )} = 1 with E[t k+1 ] < 00. 
Inductively, since Mn '■= dT>N n Bo(e), the proof is completed. ■ 



A. Convergence under bounded inputs 

The control law Ui(q) = — a(q) G T (q) <9 q {— log g(q)} may re- 
quire large inputs near the boundary Mi, since g(q) — > there. This 
can be problematic from an implementation standpoint. When these 
inputs saturate at some ||u(q)|| m ax, the control law that is practically 
implemented is rather approximated smoothly by 

w»(q) = -||u(q)||max • tanh(o(q) G T (q) 9 q V(q, t)) . 

The problem is that bounded inputs cannot force exit at Mi with 
probability one. The probability of success in exiting when bounded 
inputs are applied can be computed |51| , but there there is always 
a nonzero probability that the system will exit from Mi instead of 
Mi. Neither convergence to origin nor constraint satisfaction can be 
guaranteed almost surely. 

To recover convergence under bounded inputs, we propose a re- 
covery strategy that uses repeatedly a controller precomputed offline, 



which steers the system back inside the domain T>i. The receding 
horizon control can be re-initiated after the state is re-enters T>i. This 
recovery controller is not different from \25\ , and its use is illustrated 
in an example in Section [yn] In the absence of obstacles, and with 
infinitely large outer domain, the guarantee of convergence can thus 
be recovered even with bounded inputs. 



VII. Examples 

We present two different examples to demonstrate application of 
our control design. In the first example the stochastic optimal control 
law can be computed explicitly, and simulation results are presented 
to demonstrate its function. The effect of input saturation is also 
investigated. The second example involves a nonlinear system, where 
the stochastic optimal control laws can not be computed explicitly. 
There, we show how the application of the Feynman-Kac formula 
offers numerical controller designs, and we present the results through 
representative plots. 



A. The Stochastic Single Integrator 

Problem formulation: Consider the system {23} with the drift 
term b(q) = and G(q) is identity. This simple drift-less system can 
be described as a two-dimensional single integrator with stochastic 
uncertainty as 

dq(t) = Ui (q(t))dt + Z(q(t))dW(t); q(0) = q (29) 

where q = [x y] 1 is the state and W(t) is a 2-dimensional Wiener 
process. The objective is to find control inputs tti(q(t)) to drive 
the system to origin, using minimal inputs, avoiding obstacles, and 
moving along paths of minimal length to its destination. Here the 
system's workspace is a ball of radius po, containing M spherical 
obstacles with radii pj and centers q^ , j — 1, 2, . . . , M. 

Deterministic Path Planning: The first step is to find a reference 
trajectory for {29} ignoring noise. The nominal dynamics is just 
q = u(q(t)). We use the approach of [47 1 (other methods are also 
possible) to find a continuous trajectory minimizing a finite-horizon 
cost 

J(q,u) = f T { Cl \\u(s)\\ 2 + c 2 \\q(s)f}ds + Q(q(T)) 
Jo 

where T is the prediction horizon and c\ and C2 are arbitrary positive 
constants. The terminal cost Q(q(T)) is selected as a navigation 
function [52| defined as 



Q(q) = 



l|q|P + /3(q) 



(30) 



where k £ N is a sufficiently large positive integer. In q30j, the 
function /3 : V — > [0, oo) encodes the location and size of obstacles 
and is expressed as 

M 

/?=n& 



3=0 



with Po = Po - llqf and ft = ||q - qj || 2 - p 2 , for j = 1 



,M. 



Assume that the outcome of this procedure is an obstacle-free 
continuous state trajectory q*(t) £ V, and the resulting path is 

r 4 { 7 G E 2 I 3t e K;7 = q*(i)} . 



Way-point Generation: There exist control way-points 
{^i}fL Q g r, such that 70 = q(to), and jn = q*(T). Define the 
sets S 7i (e) = {q £ V ■ ||q — 7;|| < e} and denote their boundary 
9B 7i (e). The waypoints we select are chosen to satisfy the following 
constraint: 

max {0(a)}- min {Q(b)} < -r?(||7i-i II) (31) 

a6B 7i _ 1 (e) ij68 7i ( £ ) 

||T*-i — Till > 2e (32) 
Ri<xmn{\\ji-z\\,zeO}, R4 - 2e > || 7i _i - 7i|| (33) 

where e and Ri are positive constants. The above constraints also help 
determine the radius Ri, which is the outer radius of the domain of 
the continuous state T>i. There is no unique solution for Ri and one 
can specify an upper and lower bounds on Ri. 

The local domains T>i are now defined as 

■Di±B^(Ri)\B^(E) 
aO I = aS 7l (e)u9Z3 7l (i? I ) . 

where M = 9B 7i (e) and Mi = dB lz (Ri). Conditions {38}-{39} 
imply that 

B^(Ri)nO = 0; B 7i _ x (e) cA ,Vi . 

Stochastic optimal controller: The control input Ui(q(ti)) for 
J29l is constructed as shown in Section IIVI It achieves 



V(q) = minE 
where 



u(q(s)) T u(q(s)) ds + *(q(ti)) q(t l _ 1 ) = q 



$ = +00 • ; Xjv\i 



on Mi 



1 on M i 
The optimal control law is 

U*(q) = -a(q) ■ <9 q V(q) 

where a(q) = E(q)E T (q), V(q) = — log(;(q), and g(q) is the 
solution of the PDE 



id 2 d 2 . 

9 = 
fl = l 



Function g(q) has an analytic expression: 

Ri - ||q-7i|| 



in Vi 

on Mi 
on Mi 



ff(q) = 

which suggests a value function 
V{q) = -\og 
and a control law of the form 

Ui(q) = -a(q) 



Ri -£ 



Ri ^ ||q-7i| 
Ri - e 



q - a 



Rr - ||q- 7i||)l|q- H 



(34) 



Control input Wj(q) switches to Uj+i(q) upon hitting the boundary 
Mi for i = 1, 2, . . . until the state is in e-neighborhood of the goal. 



2 ) with the overall bounded domain 
||q|| < 10}. The initial condition is 



Problem instantiation and simulation results: Simulations were 
performed (taking q 6 
being S = {q G R 2 
qo = [x,y] J — [— 3.0, — 3.0] T . The goal is to drive the system 
to the origin. The workspace contains two obstacles of radius 0.2 
at coordinates [-3.0, -1.0] T and [-2.0, -2.0] T . Matrix E(q) is the 
2x2 identity, and Ri is chosen to satisfy ||7i-i — 7»|| < Ri — 2e 
and min{||7i — z\\,z G O} > Ri with e = 0.1. A navigation 
function Q(q) is constructed on R 2 and a trajectory for q = u(q(t)) 
is generated based on |47|. The simulation of the complete algorithm 
is shown in the Fig. [4] The navigation function is depicted in the form 
of a contour plot, while the discrete way-points are center of filled 
(red) circles. The boundaries Mi are chosen based on (38}- (33} and 
are marked in the figure by dotted black circles. 



The effect of input saturation: The following controller is a 
saturated version of (1341: 
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Fig. 4. Simulation of a stochastic receding horizon control for a stochastic 
single integrator moving in a two obstacle environment. The simulation 
was generated using Euler-Maruyama method implemented in MATLAB® 
Econometrics toolbox. |(a)| The blue trajectory shows the actual stochastic 
path taken by the system. The initial condition of the system is marked with 
a black square. The black dashed circles represents the boundary Mi while 
red disks represent the region around way-points 7; with its boundary Mi 
and the blue circle is the boundary around the final goal, and |(b)| Norm of 
the control inputs for the entire simulation with unbounded inputs. 



tanh 



(35) 



(Ri - ||q-7»ll)lk-7»l 

Figure [3] shows a sample path for the bounded input case, and 
quantifies the norm of the inputs used. 




(a) Stochastic Path 





(b) Inputs 

Fig. 5. Simulation of a stochastic receding horizon control for a stochastic 
single integrator moving in a two obstacle environment with bounded inputs. 
The system \29\ was simulated with bounded inputs (35) and |it(q)|m a x = 5- 
|(a)| The blue trajectory shows the actual stochastic path taken by the system. 
The initial condition of the system was [—3, — 3] T represented by a square. 
The black dashed circles represent the boundary Aii while red disks represent 
the region around way-points 7; with its boundary Mi and the blue circle is 
the boundary around the final goal, and |(b)| Norm of the saturated control 
inputs. Each component of the input was saturated at |«(q)| ma x = 5 using 
tanh function. 



As discussed earlier, bounded inputs (35} will not result in success 
with probability one (i.e. the probability of first hitting dB li (e)) and 
the probability of success for each local controller can be computed 
according to |51| . Figure [6] represents the probability of hitting the 
goal boundary dB li (e), before exiting the domain elsewhere for any 
given initial condition. It can be seen that there is always a nonzero 
probability that the system exits from dB li {Ri) instead of dB li (e) 
under bounded inputs, and this probability becomes higher for initial 
conditions closer to dB~ ti (Ri). 




Fig. 6. The probability of first hitting the goal boundary for the system {6} 
using bounded input (35) with |«(q)|max = 5. The probability of reaching 
the desired boundary for each local controller can be computed according to 



In this section we demonstrate a solution approach that is based on 
the Feynman-Kac formula. 

Problem formulation: Consider a mobile robot with three omni- 
directional wheels (Fig. [8). In Fig. [8] x, y mark the position, with 
respect an inertial X—Y frame, of the local, body-fixed frame X m - 
Y m , The orientation of the local frame with respect to X—Y is given 
by angle 6. The dynamical system modeling the robot has as state 
the vector q = [x,y,0] J . The input to the system is a vector u = 
[Ui, Ui, [/a] T of the linear velocities of the three wheels, denoted U\, 
U2, Us, respectively. Stochastic noise affects all three coordinates x, y 
and 8. The equations of motion for such a system can be represented 
by the following SDE 
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(36) 



To recover convergence under bounded inputs, we implement 
the recovery strategy. The implementation is shown in Fig. [7] We 
observe that the probability of convergence with recovery strategy 
can be one in absence of obstacles and sufficiently (infinitely) large 
outer boundary. In the presence of obstacles, the computation of the 
probability of convergence can only be approximated by a numerical 
estimation for finite way-pointsrj 




■0.5 0.5 



x [»| 

Fig. 7. An example of the recovery strategy. The blue trajectory is evolution 
under a controller «i(q) which fails and the system exits at dB~ li (Ri)- The 
dotted circles form domain of the recovery controller and the system is driven 
back inside the domain T>i. 



B. A Nonlinear System 

Finding a solution to the PDE is central to the proposed control 
design. In Section [VII- A| such a solution can be obtained explicitly, 
but with \\6\ having varying coefficients, this is not true in general. 

3 The probability of convergence can be shown to be equal to one if we 
consider the state constraints to be reflective boundary; this is a topic for a 
different paper. 




Fig. 8. A graphical representation of an omni-directional robot, showing the 
variables involved in the dynamical model j36) . 



Remark 2: Formally, q = [x,y, 9] T belongs in the two- 
dimensional special Euclidean group SE(2); it can, however, be 
embedded in R 4 150), where the usual metrics can be used. Here, 
the metric || [xi, yi, #i] T || = y 'x\ + y\ + (cos6»i - l) 2 + (sin6»i) 2 
(see |50| ) is used. 

The goal is to a find control law Ui (q(i)) to drive l |36| > to the 
origin x = y = 6 = 0, using inputs of minimal magnitude, following 
paths of minimal length, and avoiding obstacles along the way. The 
robot's workspace is a torus, containing a finite number M of torus- 
shaped obstacles at locations qj, j = 1, 2, . . . , M. The robot's outer 
workspace boundary, and those of the obstacles for i = 1, . . . , M 
are is defined as 

dS = {(x, y, 6) e R 2 x § I x 2 + y 2 = pi, V6> G §} (37a) 

dOi 4 {(x,y,8) G R 2 x S I (x-Xif + (y-y z ) 2 = p 2 .y8 £§} . 

(37b) 



Matching (36} to (23} we identify the different terms as follows: 

6(q) = [0 0] T , 
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Deterministic Path Planning: Using the metric introduces in 
Remark [2] and the definition of obstacle and outer boundary in ([37}, 
we apply the path planning approach of Section [VII-A| selecting a 
fixed R satisfying inf z6 o,t>o ||q(t) — z\\ > R > 2e > 0. 

Let us denote Q*(t) the obstacle-free continuous state trajectory 
found using, say |47|. Then the path is expressed directly as Ft = 

{7 G R 2 x § I 3t G R;7 = q*(t)}. 

Way-point Generation: Here we will select a sequence 
{7i}fl G IY, of waypoints. The objective of stochastic controller 
for each discrete state i is to make l |36| > converge e > close to 
way-point ji. 

To this end, define a set B li {e) = {q G V : ||q — 7; || < e} and 
denote its boundary <9,B 7i (e). Then define domains T>i = {(x, y, 9) G 
R 2 x § I x 2 + y 2 < R,\\(x,y,6)\\ > e,V0 G §}, and select an 
arbitrary set of N points from Ft, such that 70 = q(io), Jn = 
q*(T), and for i = 1, ... ,7V - 1, 



max {Q(a)} 

06B.. 



mm 

6S8 e . 



{Q(&)} < -f?(||7i- 



(38) 



= n dB li (e) and 
,/V. 



2e > ||7i_i - 7i|| > 2e (39) 

The boundaries Mi and Mi are defined as Mi 
Mi — &Di \Mi, respectively for all i = 1, . . 

Stochastic optimal controller: The PDE l |16| l is now written as 

£3 = in Vi (40) 

g = exp( - $(£(tat 4 ))) on Mi U Mi = dD t 

where C is an operator on functions defined as £(■) = 
!tr{d qq (-)G(q)E(q)£T(q)GT(q)}. 

Equation < |40[ > does not admit analytic solutions. Common ap- 
plicable numerical methods such as finite differences and finite 
elements have difficulty producing acceptable solutions for instances 
of problems with dimension larger than three and complex boundary 
conditions. Alternatively, the Feynman-Kac's formula (see Section 
[TV} , relates the PDE to an SDE: 
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which is essentially the unforced system q36f. Then, we know that 
the function g(q) satisfies 

ff(q)=P[6(ti)eM|6 = q] (42) 

where ti is the first exit time from the domain T>i. 

Problem instantiation and simulation results: The probability in 
(42} can be estimated numericall)]^] by simulating sufficiently many 
sample paths of \-\\\ with different initial conditions q. We produce 
these sample paths using the Euler-Maruyama method [53]. Using the 
same method, we also obtain sample paths for {36}. A 41 x 41 x 41 

4 The source code to compute function <?(q) is available at 
http://code.google.eom/p/stochastic-receding-horizon-control/ 



grid is imposed on the state space, and treating each node as an initial 
condition, we produce 500 sample paths and estimate (42}. With the 
estimate of (42}, the control law is computed numerically as 

<(q) = -E(q) E T (q) G T (q) 3 q ( - log(.g(q))) . 

Figure [5] presents two numerical approximations of g(q) in the form 
of 2D colormaps with robot orientation set at and ~ radians, 
respectively. Equipped with such a map, a numerical gradient can 
be used to calculate the control input. Figure [TU] shows a single 
sample path for the closed loop version of (36}. The time history of 
individual states x, y and 9 are shown in Figs. 1 l(a)}(7T(c) indicating 



U(d)l plots 



the convergence to an e neighborhood of the origin. Figure 
the norm of the control inputs used. Numerical data confirmed that 
the probability that the closed loop system hits every desired goal 
boundary dMi is one. 
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(a) Solution g(q) for robot orientation 8 = 




(b) Solution <?(q) for robot orientation 6 = ^ 

Fig. 9. Numerical solution g(q) of PDE (40) for stochastic system (36) for 
R = 1 and e = 0.1. 



VIII. Conclusions 

The proposed method allows the design of a receding horizon 
navigation controller for nonlinear systems governed by stochastic 
differential equations. If a feasible path, optimal or otherwise, is 
available in the form of a finite sequence of way-points, then an 
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Fig. 10. A sample path for initial condition [it, y, 8] T = [—3.0, —3.0, 1.0] T . 
Black circular dots represent two obstacles at [— 3, — 1, *] T and [—2, —2, *] T . 
The robot position is shown by a red triangle and local coordinate axis at each 
switching point. Dotted circles represent the projection of boundary Mi on 
the X-Y plane. 

an optimal control law can be found to steer the stochastic system 
between these way-points, while keeping it close to the path and 
away from unsafe regions with probability one. In cases where 
control inputs are forced within upper and lower bounds, and state 
constraints (obstacles) are imposed, almost-sure convergence and 
safety is impossible, but it can be achieved with some probability 
which depends on how severe the input bounds are compared with 
respect to the magnitude of subjected noise. For nonlinear systems 
with dynamics not permitting analytic solutions for the resulting 
PDEs, numerical solutions for dimensions up to 5 or 6 are shown to 
be well within the reach of currently available computing platforms. 
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Fig. 11. Individual trajectories and control input for the simulation presented 
in Fig. 1 1 ()| [(a)] |(b)| [(c)] Individual state trajectories converges to zero. |(d)| The 
Euclidean norm of the control input applied for the entire trajectory. 
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